Optimal Canonization of All Substrings of a String
Authors
Abstract
Any word can be decomposed uniquely into lexicographically nonincreasing factors, each of which is a Lyndon word. This paper addresses the relationship between the Lyndon decomposition of a word x and a canonical rotation of x, i.e., a rotation w of x that is lexicographically smallest among all rotations of x. The main combinatorial result is a characterization of the Lyndon factor of x with which w must start. As an application, faster on-line algorithms for finding the canonical rotation(s) of x are developed by nontrivial extension of known Lyndon factorization strategies. Unlike their predecessors, the new algorithms lend themselves to incremental variants that compute, in linear time, the canonical rotations of all prefixes of x. The fastest such variant represents the main algorithmic contribution of the paper. It performs within the same 3|x| character-comparisons bound as that of the fastest previous on-line algorithms for the canonization of a single string. This leads to the canonization of all substrings of a string in optimal quadratic time, within less than 3|x|^2 character comparisons and using linear auxiliary space.
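To make the two notions in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It factors a string with the classical Duval procedure, finds the canonical rotation by brute force, and lets one observe that the rotation begins at the start of some Lyndon factor. The function names `lyndon_factors`, `factor_starts`, and `least_rotation_start` are hypothetical, chosen only for this illustration.

```python
def lyndon_factors(s):
    """Lyndon factorization of s via Duval's algorithm (linear time)."""
    factors = []
    n = len(s)
    i = 0
    while i < n:
        j, k = i + 1, i
        # Extend the current run of repeated Lyndon words as far as possible.
        while j < n and s[k] <= s[j]:
            if s[k] < s[j]:
                k = i          # still a single (longer) Lyndon word
            else:
                k += 1         # continuing a periodic repetition
            j += 1
        # Emit every full copy of the Lyndon word of length j - k.
        while i <= k:
            factors.append(s[i:i + j - k])
            i += j - k
    return factors


def factor_starts(factors):
    """Starting index of each factor within the original string."""
    starts, pos = [], 0
    for f in factors:
        starts.append(pos)
        pos += len(f)
    return starts


def least_rotation_start(s):
    """Brute-force index of a lexicographically smallest rotation
    (quadratic time; for illustration only)."""
    return min(range(len(s)), key=lambda i: s[i:] + s[:i])


x = "cabca"
factors = lyndon_factors(x)        # nonincreasing Lyndon factors of x
start = least_rotation_start(x)    # where the canonical rotation begins
print(factors, start, start in factor_starts(factors))
```

In this example the canonical rotation starts at a factor boundary that is not the last factor, which hints at why identifying the right factor calls for the kind of characterization the abstract describes.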
Similar resources
Minimum Unique Substrings and Maximum Repeats
Unique substrings appear scattered in the stringology literature and have important applications in bioinformatics. In this paper we initiate a study of minimum unique substrings in a given string; that is, substrings that occur exactly once while all their substrings are repeats. We discover a strong duality between minimum unique substrings and maximum repeats which, in particular, allows fas...
Efficient Approximate and Dynamic Matching of Patterns Using a Labeling Paradigm
A key approach in string processing algorithmics has been the labeling paradigm [KMR72], which is based on assigning labels to some of the substrings of a given string. If these labels are chosen consistently, they can enable fast comparisons of substrings. Until the first optimal parallel algorithm for suffix tree construction was given in [SV94], the labeling paradigm was considered not to be compe...
Weak Repetitions in Strings
A weak repetition in a string consists of two or more adjacent substrings which are permutations of each other. We describe a straightforward Θ(n^2) algorithm which computes all the weak repetitions in a given string of length n defined on an arbitrary alphabet A. Using results on Fibonacci and other simple strings, we prove that this algorithm is asymptotically optimal over all known encodings o...
The Efficient Computation of Complete and Concise Substring Scales with Suffix Trees
Strings are an important part of most real application multivalued contexts. Their conceptual treatment requires the definition of substring scales, i.e., sets of relevant substrings, so as to form informative concepts. However, these scales are either defined by hand or derived in a context-unaware manner (e.g., all words occurring in string values). We present an efficient algorithm based on s...
A New Family of String Classifiers Based on Local Relatedness
This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely, longest common substrings (LCStr’s), longest common subsequences (LCSeq’s), and window-accumulated longest common subsequences (wLCSeq’s). We show that finding the optimal classifier for two given sets of strings (the positive set and the negative set)...
Journal:
- Inf. Comput.
Volume 95, Issue
Pages -
Publication date: 1991